-
Notifications
You must be signed in to change notification settings - Fork 37
LLM qualifier #435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
LLM qualifier #435
Conversation
Docs preview URL |
Coverage Report
Files without new missing coverage
283 files skipped due to complete coverage. Coverage success: total of 97.98% is above 97.94% 🎉 |
d2e1f39 to
65669dc
Compare
9a55bad to
f3ac94b
Compare
|
Hi @aricohen93, great work ! I took the liberty to make some small/large modifications. Here's a summary
|
a5f65cb to
ecaf893
Compare
…e schema w/ pydantic
ecaf893 to
5ba1346
Compare
|
| from pydantic import BeforeValidator, PlainSerializer, WithJsonSchema | ||
| import edsnlp, edsnlp.pipes as eds | ||
| BiopsySchema = Annotated[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@percevalw we should clarify this example as discussed.
Maybe also inverse the order of examples, the more basic example in first place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@percevalw
Add also an example of Typing with a TypedDict as follows :
from typing import TypedDict
class BiopsySchemaDict(TypedDict):
biopsy_procedure: bool
....
eds.llm_span_qualifier(
attributes=["biopsy_procedure"], # When the output_schema is a TypedDict the attributes specification is mandatory|
@percevalw from spacy.tokens import Doc
class ContextFormatter:
def __init__(self, prefix: str, suffix: str):
self.prefix = prefix
self.suffix = suffix
def __call__(self, context: Doc) -> str:
span = context.ents[0].text if context.ents else ""
prefix = self.prefix.format(span=span)
suffix = self.suffix.format(span=span)
return f"{prefix}{context.text}{suffix}"
context_formatter = ContextFormatter(prefix="\n## Context\n\n<<<\n", suffix= "\n>>>\n\n## Instruction\nDoes '{span}' corresponds to a Biopsy date?") |
|



Description
The
LLMSpanClassifiercomponent is a LLM attribute predictor.Tutorial :
https://edsnlp-llm-qualifier.vercel.app/llm-qualifier/tutorials/qualifying-entities-with-llm/
Two main classes are proposed:
LLMSpanClassifiercomponent is a LLM attribute predictor.AsyncLLMclass is a helper to interact asyncronuously with an external llm API. It receives a list of messages (from edsnlp or not) and it returns the answer.The tests are also structured in two main parts.
Checklist